Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix phonetics for 俄 #535

Merged
merged 1 commit into from
Oct 18, 2024
Merged

Fix phonetics for 俄 #535

merged 1 commit into from
Oct 18, 2024

Conversation

xatier
Copy link
Contributor

@xatier xatier commented Sep 26, 2024

The concise dictionary prefers ㄜˊ, we provide both for fault tolerance.

https://dict.concised.moe.edu.tw/dictView.jsp?ID=39379&q=1

俄頃 and 俄而 are the exceptions, as they have different meanings.

@lukhnos
Copy link
Contributor

lukhnos commented Oct 16, 2024

@xatier I'm more reluctant accepting this PR and #530. I think it comes to a few things. There's the diminishing return of this type of efforts in the name of fault tolerance—are there evidence that this really helps a Taiwanese Mandarin user?

I understand that "this is what the official dictionary says", but anecdotally reading 俄 as ㄜˋ is very typically Taiwanese Mandarin whereas 俄 as ㄜˊ is clearly not. #530 is similar.

@xatier
Copy link
Contributor Author

xatier commented Oct 16, 2024

I do not have a strong preference on having these proposed pronunciations for these, either. What I do notice is that we already have a fixed combination from both sides (the concise dictionary form vs. common form) for many characters, providing consistency and fault tolerance is my goal for these PRs. #530 is a great example of this, the current dictionary file is highly inconsistent.

As all my previous PRs, feel free to close these them if you find them not providing value for 小麥 users.

@lukhnos lukhnos mentioned this pull request Oct 17, 2024
@xatier
Copy link
Contributor Author

xatier commented Oct 17, 2024

IMHO, I see 亞 and 俄 are very similar to #522, I would like to have both pronunciations if we find them are used with equal frequency. Another approach is to ensure all terms have ㄧㄚˇ and ㄜˋ, but NOT add the concise dictionary forms (ㄧㄚˋ and ㄜˊ) to them while keeping the existing ones without deletions.

We can also review them case-by-case, just let me know which terms to exclude.

Please let me know which approach you would prefer, and we can start from there. :)

@lukhnos
Copy link
Contributor

lukhnos commented Oct 17, 2024

Another approach is to ensure all terms have ㄧㄚˇ and ㄜˋ, but NOT add the concise dictionary forms (ㄧㄚˋ and ㄜˊ) to them while keeping the existing ones without deletions.

I like the idea, but let's be a bit more nuanced and conservative—for 亞 I'm comfortable with adding the missing ㄧㄚˇ to the entries that only have the ㄧㄚˋ reading. For 俄 I suggest you only add ㄜˋ to entries that are related to 俄國、俄語系 etc., that is when it's used in a proper name. The thing is that Classical Chinese trems such as 俄頃 are never read with the fourth tone even in Taiwanese Mandarin.

@tianjianjiang
Copy link
Member

Another approach is to ensure all terms have ㄧㄚˇ and ㄜˋ, but NOT add the concise dictionary forms (ㄧㄚˋ and ㄜˊ) to them while keeping the existing ones without deletions.

I like the idea, but let's be a bit more nuanced and conservative—for 亞 I'm comfortable with adding the missing ㄧㄚˇ to the entries that only have the ㄧㄚˋ reading. For 俄 I suggest you only add ㄜˋ to entries that are related to 俄國、俄語系 etc., that is when it's used in a proper name. The thing is that Classical Chinese trems such as 俄頃 are never read with the fourth tone even in Taiwanese Mandarin.

My thoughts are similar to @lukhnos', and MOE dict actually mentions ㄧㄚˇ and ㄜˋ, especially when ㄜˋ is indeed only for 俄國.

@xatier
Copy link
Contributor Author

xatier commented Oct 18, 2024

My thoughts are similar to @lukhnos', and MOE dict actually mentions ㄧㄚˇ and ㄜˋ, especially when ㄜˋ is indeed only for 俄國.

I believe this page does prefer ㄜˊ for Russia, though.

https://dict.concised.moe.edu.tw/dictView.jsp?ID=39379&q=1

@tianjianjiang
Copy link
Member

My thoughts are similar to @lukhnos', and MOE dict actually mentions ㄧㄚˇ and ㄜˋ, especially when ㄜˋ is indeed only for 俄國.

I believe this page does prefer ㄜˊ for Russia, though.

https://dict.concised.moe.edu.tw/dictView.jsp?ID=39379&q=1

I don't disagree. Just saying that it was ㄜˋ, see https://dict.revised.moe.edu.tw/dictView.jsp?ID=10266&q=1&word=%E4%BF%84.
I suspect it's the influence from Beijing Mandarin. Another example is 法蘭西 vs. 法 (國). When I was a kid, it was common that transliterations and their single-character aliases could have different tones in Taiwan Mandarin, but not in Beijing Mandarin.

@xatier
Copy link
Contributor Author

xatier commented Oct 18, 2024

I'm not sure if the MOE is adopting Beijing tones, but same experience for me that all these characters are pronounced differently with what I learnt from schools.

@xatier
Copy link
Contributor Author

xatier commented Oct 18, 2024

I've updated this PR to include only the additions of ㄜˋ.

Here's the current state of the dictionary:

 $ ./find.py 俄 ㄜˊ
俄亥俄 ㄜˊ ㄏㄞˋ ㄜˊ       
俄人 ㄜˊ ㄖㄣˊ
俄共 ㄜˊ ㄍㄨㄥˋ
俄國 ㄜˊ ㄍㄨㄛˊ
俄國人 ㄜˊ ㄍㄨㄛˊ ㄖㄣˊ
俄式 ㄜˊ ㄕˋ
俄文 ㄜˊ ㄨㄣˊ
俄文系 ㄜˊ ㄨㄣˊ ㄒㄧˋ
俄文組 ㄜˊ ㄨㄣˊ ㄗㄨˇ
俄羅斯 ㄜˊ ㄌㄨㄛˊ ㄙ  
俄羅斯人 ㄜˊ ㄌㄨㄛˊ ㄙ ㄖㄣˊ
俄而 ㄜˊ ㄦˊ                 
俄語 ㄜˊ ㄩˇ                                                                        
俄語系 ㄜˊ ㄩˇ ㄒㄧˋ                  
俄語組 ㄜˊ ㄩˇ ㄗㄨˇ         
俄軍 ㄜˊ ㄐㄩㄣ
俄頃 ㄜˊ ㄑㄧㄥˇ    
帝俄 ㄉㄧˋ ㄜˊ      
帝俄時代 ㄉㄧˋ ㄜˊ ㄕˊ ㄉㄞˋ
懷俄明州 ㄏㄨㄞˊ ㄜˊ ㄇㄧㄥˊ ㄓㄡ
日俄戰爭 ㄖˋ ㄜˊ ㄓㄢˋ ㄓㄥ
沙俄 ㄕㄚ ㄜˊ               


$ ./find.py 俄 ㄜˋ
中俄 ㄓㄨㄥ ㄜˋ
俄亥俄 ㄜˋ ㄏㄞˋ ㄜˋ
俄亥俄州 ㄜˋ ㄏㄞˋ ㄜˋ ㄓㄡ
俄人 ㄜˋ ㄖㄣˊ
俄共 ㄜˋ ㄍㄨㄥˋ
俄國 ㄜˋ ㄍㄨㄛˊ
俄國人 ㄜˋ ㄍㄨㄛˊ ㄖㄣˊ
俄式 ㄜˋ ㄕˋ
俄文 ㄜˋ ㄨㄣˊ
俄文系 ㄜˋ ㄨㄣˊ ㄒㄧˋ
俄文組 ㄜˋ ㄨㄣˊ ㄗㄨˇ
俄新網 ㄜˋ ㄒㄧㄣ ㄨㄤˇ
俄羅斯 ㄜˋ ㄌㄨㄛˊ ㄙ
俄羅斯人 ㄜˋ ㄌㄨㄛˊ ㄙ ㄖㄣˊ
俄羅斯共和國 ㄜˋ ㄌㄨㄛˊ ㄙ ㄍㄨㄥˋ ㄏㄜˊ ㄍㄨㄛˊ
俄羅斯方塊 ㄜˋ ㄌㄨㄛˊ ㄙ ㄈㄤ ㄎㄨㄞˋ
俄羅斯籍 ㄜˋ ㄌㄨㄛˊ ㄙ ㄐㄧˊ
俄語 ㄜˋ ㄩˇ
俄語系 ㄜˋ ㄩˇ ㄒㄧˋ
俄語組 ㄜˋ ㄩˇ ㄗㄨˇ
俄軍 ㄜˋ ㄐㄩㄣ
反共抗俄 ㄈㄢˇ ㄍㄨㄥˋ ㄎㄤˋ ㄜˋ
帝俄 ㄉㄧˋ ㄜˋ
帝俄時代 ㄉㄧˋ ㄜˋ ㄕˊ ㄉㄞˋ
懷俄明州 ㄏㄨㄞˊ ㄜˋ ㄇㄧㄥˊ ㄓㄡ
日俄 ㄖˋ ㄜˋ
日俄戰爭 ㄖˋ ㄜˋ ㄓㄢˋ ㄓㄥ
歐俄 ㄡ ㄜˋ
沙俄 ㄕㄚ ㄜˋ
白俄 ㄅㄞˊ ㄜˋ
白俄羅斯 ㄅㄞˊ ㄜˋ ㄌㄨㄛˊ ㄙ
蘇俄 ㄙㄨ ㄜˋ

We have some ㄜˋ phrases that don't come with ㄜˊ. Let me know if you want me to further update the PR.

Copy link
Contributor

@lukhnos lukhnos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Another approach is to ensure all terms have ㄧㄚˇ and ㄜˋ, but NOT add the concise dictionary forms (ㄧㄚˋ and ㄜˊ) to them while keeping the existing ones without deletions.

I think the PR is good as is, and I think this is the approach we should also use for #530, that is only add the missing variant ㄧㄚˇ entries where there are an existing ㄧㄚˋ.

@lukhnos lukhnos mentioned this pull request Oct 18, 2024
@lukhnos lukhnos merged commit e42f469 into openvanilla:master Oct 18, 2024
1 check passed
@xatier xatier deleted the er branch October 21, 2024 22:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants